Cluster Stability and the Use of Noise in Interpretation of Clustering

Authors

  • George S. Davidson
  • Brian N. Wylie
  • Kevin W. Boyack
Abstract

A clustering and ordination algorithm suitable for mining extremely large databases, including those produced by microarray expression studies, is described and analyzed for stability. Data from a yeast cell cycle experiment with 6000 genes and 18 experimental measurements per gene are used to test this algorithm under practical conditions. Ordination, the process of assigning each database object to an X,Y coordinate, is shown to be stable with respect to random starting conditions and with respect to minor perturbations in the starting similarity estimates. Careful analysis of the way clusters typically co-locate, versus the occasional large displacements under different starting conditions, is shown to be useful in interpreting the data. This extra stability information is lost when only a single clustering result is reported, which is currently the accepted practice. However, it is believed that the approaches presented here should become a standard part of best practices in analyzing computer clustering of large data collections.
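The stability idea in the abstract can be illustrated with a minimal sketch. It does not reproduce the authors' ordination algorithm, which the abstract does not specify; plain k-means is swapped in as a stand-in, and the function names (`kmeans`, `co_cluster_frequency`) and the parameters `runs` and `noise` are hypothetical. The sketch adds small Gaussian noise to the data, reclusters from fresh random starting centers on each run, and records how often each pair of objects lands in the same cluster.

```python
import random
from itertools import combinations


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; a stand-in, not the paper's ordination algorithm."""
    centers = rng.sample(points, k)  # random starting conditions
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def co_cluster_frequency(points, k, runs=20, noise=0.05, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    counts = {pair: 0 for pair in combinations(range(len(points)), 2)}
    for _ in range(runs):
        # Perturb every coordinate slightly, then recluster from a new start.
        noisy = [tuple(x + rng.gauss(0.0, noise) for x in p) for p in points]
        labels = kmeans(noisy, k, rng)
        for i, j in counts:
            if labels[i] == labels[j]:
                counts[(i, j)] += 1
    return {pair: n / runs for pair, n in counts.items()}


if __name__ == "__main__":
    # Toy data (an assumption for illustration): two well-separated groups.
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    for pair, freq in sorted(co_cluster_frequency(data, k=2).items()):
        print(pair, freq)
```

Pairs whose frequency stays near 1 across runs co-locate stably, while pairs with intermediate frequencies are the ones whose grouping depends on the starting conditions or the noise. A single reported clustering hides this distinction.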

Similar resources

A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm

Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...

Oil Reservoirs Classification Using Fuzzy Clustering (RESEARCH NOTE)

Enhanced Oil Recovery (EOR) is a well-known method to increase oil production from oil reservoirs. Applying EOR to a new reservoir is a costly and time-consuming process. Incorporating available knowledge of oil reservoirs in the EOR process eliminates these costs and saves operational time and work. This work presents a universal method to apply EOR to reservoirs based on the available data by...

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous growth of spatial trajectory data and the need to process it and extract useful information and meaningful patterns have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...

A new ensemble clustering method based on fuzzy cmeans clustering while maintaining diversity in ensemble

Ensemble clustering has been considered as one of the research approaches in data mining, pattern recognition, machine learning and artificial intelligence over the last decade. In clustering, the combination first produces several base clusterings, and then, for their aggregation, a function is used to create a final clustering that is as similar as possible to all the base clusterings. The inp...


Publication date: 2001